Abstract: We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of choosing any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte Carlo simulations are conducted to compare the finite sample performance of the new method with that delivered by the normal approximation and the block bootstrap approach.
1 Introduction

In time series analysis, constructing a confidence interval for an unknown quantity is often difficult owing to dependence. For example, for a stationary time series $\{X_t\}_{t\in\mathbb{Z}}$, suppose the quantity of interest, $\theta$, is the median of the marginal distribution of $X_1$ (denoted $\mathrm{med}(X_1)$). Once we observe the data $(X_1,\ldots,X_n)$, a natural estimator of $\theta$ is $\hat\theta_n = \mathrm{med}(X_1,\ldots,X_n)$. Under suitable weak dependence conditions, one can show that

$$\sqrt{n}(\hat\theta_n - \theta) \to_D N(0,\sigma^2), \qquad (1)$$

where "$\to_D$" stands for convergence in distribution,

$$\sigma^2 = \{4g^2(\theta)\}^{-1}\sum_{k=-\infty}^{\infty}\mathrm{cov}\{1 - 2\mathbf{1}(X_0 < \theta),\, 1 - 2\mathbf{1}(X_k < \theta)\},$$

with $g(\cdot)$ being the density function of $X_1$; see Bühlmann (2002). To establish a confidence interval for $\theta$ based on expression (1), we need to find a consistent estimate of $\sigma^2$, which boils down to estimating $g(\theta)$ and the long-run variance of the transformed series $1 - 2\mathbf{1}(X_t < \theta)$. Consistent estimates of these two quantities both involve a smoothing
parameter, the choice of which has been extensively studied in the literature. However, no empirical investigation seems to have been done along this line, in part because there are more appealing alternatives, such as the moving block bootstrap method (Künsch 1989; Liu and Singh 1992) and the subsampling approach (Politis, Romano and Wolf 1999). These resampling techniques are powerful in that they bypass direct estimation, and the resulting confidence intervals have asymptotically correct coverage probability under appropriate conditions. However, a practical drawback of these methods is that they all require the selection of a user-chosen parameter, such as the block length in the moving block bootstrap and the window width in the subsampling method. The empirical coverage probability can be sensitive to the choice of these user-chosen numbers. Although methods that address the optimal choice of the tuning parameters are available (Hall, Horowitz and Jing 1995; Politis et al. 1999), they are usually rather ad hoc, or involve another user-chosen number and require very expensive computation. A confidence interval can also be constructed by using the blockwise empirical likelihood method (Kitamura 1997), but again one must deal with the issue of block size selection.

In this paper, we propose a new approach to constructing confidence intervals (regions) for a large class of quantities that are encountered in time series analysis. The method does not involve any user-chosen numbers and yields a confidence interval that has asymptotically correct coverage. The interval is constructed on the basis of a self-normalized statistic, where the normalization matrix is formed by using recursive estimates. The proposed self-normalized method extends Lobato (2001) from the sample autocovariances to more general approximately linear statistics. It also relates to recent work on fixed-b asymptotics in the econometrics literature (Kiefer, Vogelsang and Bunzel 2000; Kiefer and Vogelsang 2002b, 2005, among others). As an important methodological contribution, the new approach can be used to construct confidence intervals and to test hypotheses based on an approximately linear statistic that has a non-differentiable influence function (e.g. the sample median), to which none of the aforementioned works is directly applicable. Further, it can be extended to confidence interval construction for quantities that are functionals of the joint distribution of $\{X_t\}_{t\in\mathbb{Z}}$, which are of interest in spectral analysis.

We now introduce some notation. For a column vector $x = (x_1,\ldots,x_q)' \in \mathbb{R}^q$, let $|x| = (\sum_{j=1}^q x_j^2)^{1/2}$. Let $\xi$ be a random vector. Write $\xi \in \mathcal{L}^p$ ($p > 0$) if $\|\xi\|_p := [E(|\xi|^p)]^{1/p} < \infty$, and let $\|\cdot\| = \|\cdot\|_2$. The symbols $O_p(1)$ and $o_p(1)$ signify being bounded in probability and convergence to zero in probability, respectively. Denote by $\lfloor a\rfloor$ the integer part of $a$.

The paper is organized as follows. Section 2 introduces the main idea of confidence interval construction for quantities that are functionals of the finite dimensional marginal distribution, describes its connection to the fixed-b approach and proposes a new test for non-correlation. Section 3 extends the applicability of our method to quantities that are functionals of the whole joint distribution of the series. In Section 4, simulation results are presented to examine the finite sample performance of the new method in comparison with the standard and block bootstrap approaches. Section 5 concludes, and technical details are relegated to the appendix.

2 Methodology 

In this section, we confine our discussion to quantities that can be expressed as functionals of the $m$-dimensional marginal distribution of $\{X_t\}_{t\in\mathbb{Z}}$, where $m$ is a fixed but arbitrary integer. In other words, let $\theta = T(F_m)$, where $F_m$ is the marginal distribution of $Y_t = (X_t,\ldots,X_{t+m-1})'$ and $T$ is a functional that takes values in $\mathbb{R}^q$. Let $N = n - m + 1$ and let $\rho_N = N^{-1}\sum_{t=1}^N \delta_{Y_t}$ be the empirical distribution, where $\delta_y$ denotes the point mass at $y \in \mathbb{R}^m$. A natural estimator of $\theta$ is $\hat\theta_N = T(\rho_N)$. We focus on the class of approximately linear statistics in this section. An approximately linear statistic $T(\rho_N)$ admits the following expansion in a neighborhood of $F_m$:

$$T(\rho_N) = T(F_m) + N^{-1}\sum_{t=1}^{N} IF(Y_t;F_m) + R_N, \qquad (2)$$

where $IF(Y_t;F_m)$ is the influence function of $T$ (Hampel, Ronchetti, Rousseeuw and Stahel 1986), defined by

$$IF(y;F_m) = \lim_{\epsilon\downarrow 0}\frac{T\{(1-\epsilon)F_m + \epsilon\delta_y\} - T(F_m)}{\epsilon},$$

and $R_N$ is the remainder term. For example, the $(m-1)$th ($m\in\mathbb{N}$) lag autocovariance and autocorrelation, denoted by $\gamma(m-1) = \mathrm{cov}(X_0, X_{m-1})$ and $\rho(m-1) = \gamma(m-1)/\gamma(0)$ respectively, depend only on $F_m$. Their sample estimates

$$\tilde\gamma_n(m-1) = (n - |m-1|)^{-1}\sum_{t=1}^{n-|m-1|}(X_t - \bar X_n)(X_{t+|m-1|} - \bar X_n),$$

where $\bar X_n = n^{-1}\sum_{t=1}^n X_t$, and $\tilde\rho_n(m-1) = \tilde\gamma_n(m-1)/\tilde\gamma_n(0)$ are functionals of $\rho_N$. Note that the commonly used estimates $\hat\gamma_n(m-1) = n^{-1}\sum_{t=1}^{n-|m-1|}(X_t - \bar X_n)(X_{t+|m-1|} - \bar X_n)$ and $\hat\rho_n(m-1) = \hat\gamma_n(m-1)/\hat\gamma_n(0)$ differ from $\tilde\gamma_n(m-1)$ and $\tilde\rho_n(m-1)$ by a constant factor, and it is easy to see that these two definitions are asymptotically equivalent for fixed $m$. See Künsch (1989) for more examples of approximately linear statistics, such as various location and scale estimators for the marginal distribution of $X_1$, von Mises statistics and M-estimators of time series models.
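The two autocovariance estimators above differ only in their divisor, so their ratio is exactly $(n - |m-1|)/n$. A minimal NumPy sketch (the function names are hypothetical, and the lag `k` plays the role of $m - 1$) makes this constant factor explicit:

```python
import numpy as np

def acov_tilde(x, k):
    """tilde-gamma_n(k): centred at the full-sample mean, divisor n - |k|."""
    x = np.asarray(x, dtype=float)
    n, k = x.size, abs(k)
    d = x - x.mean()                      # X_t - X-bar_n
    return float(d[: n - k] @ d[k:]) / (n - k)

def acov_hat(x, k):
    """hat-gamma_n(k): the commonly used divisor n."""
    x = np.asarray(x, dtype=float)
    n, k = x.size, abs(k)
    d = x - x.mean()
    return float(d[: n - k] @ d[k:]) / n

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
k = 3
ratio = acov_hat(x, k) / acov_tilde(x, k)   # equals (n - |k|) / n = 197 / 200
```

Because the factor tends to one for fixed $k$, the two estimators share the same first-order asymptotics, which is the equivalence noted in the text.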

Under expansion (2) and some regularity conditions that ensure the negligibility of $R_N$, $\sqrt{N}(\hat\theta_N - \theta) \to_D N\{0, \Sigma(F_m)\}$, where

$$\Sigma(F_m) = \sum_{k=-\infty}^{\infty}\mathrm{cov}\{IF(Y_0;F_m), IF(Y_k;F_m)\}$$

is the long-run variance matrix of the stationary process $\{IF(Y_t;F_m)\}_{t\in\mathbb{Z}}$. Equivalently, $\Sigma(F_m)$ is the spectral density matrix of $\{IF(Y_t;F_m)\}$ evaluated at zero frequency (up to a constant factor). As shown in the example of the median, $\Sigma(F_m)$ could contain some nuisance parameters, which render consistent estimation of $\Sigma(F_m)$ a difficult task.

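To motivate our proposal, we consider $\hat\theta_{\lfloor rN\rfloor} = T(\rho_{\lfloor rN\rfloor})$ for $r\in(0,1]$, which estimates $\theta$ on the basis of the subsample of the first $\lfloor rN\rfloor$ observations of $Y_t$, i.e. $(Y_1,\ldots,Y_{\lfloor rN\rfloor})$. Analogously to equation (2), we have

$$T(\rho_{\lfloor rN\rfloor}) = T(F_m) + \lfloor rN\rfloor^{-1}\sum_{t=1}^{\lfloor rN\rfloor} IF(Y_t;F_m) + R_{\lfloor rN\rfloor}.$$

Let $D[0,1]$ be the space of functions on $[0,1]$ that are right continuous and have left limits, endowed with the Skorokhod topology (Billingsley 1968). Denote by "$\Rightarrow$" weak convergence in $D^q[0,1]$. Our method hinges on the following two assumptions.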

Assumption 2.1. Assume that $E\{IF(Y_t;F_m)\} = 0$ and

$$N^{-1/2}\sum_{t=1}^{\lfloor rN\rfloor} IF(Y_t;F_m) \Rightarrow \Lambda B_q(r), \qquad (3)$$

where $\Lambda$ is a $q\times q$ lower triangular matrix with nonnegative diagonal entries and $B_q(\cdot)$ is a $q$-dimensional vector of independent Brownian motions. Assume that $\Lambda\Lambda' = \Sigma(F_m)$ is positive definite.

Assumption 2.2. Assume that $R_N = o_p(N^{-1/2})$ and $N^{-2}\sum_{t=1}^N |tR_t|^2 = o_p(1)$.

Let $W_N = N^{-2}\sum_{t=1}^N t^2(\hat\theta_t - \hat\theta_N)(\hat\theta_t - \hat\theta_N)'$. Denote

$$V_q = \int_0^1\{B_q(r) - rB_q(1)\}\{B_q(r) - rB_q(1)\}'\,dr \quad\text{and}\quad U_q = B_q(1)'V_q^{-1}B_q(1).$$

The upper critical values of $U_q$ have been tabulated by Lobato (2001) for $q = 1,\ldots,20$.
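Lobato's (2001) table is the reference source for these critical values. As a hedged alternative, the percentiles of $U_q$ can be approximated by Monte Carlo, replacing $B_q$ by a scaled Gaussian random walk on a grid and the integral in $V_q$ by a Riemann sum. The function name, grid size, replication count and seed below are illustrative assumptions, and the resulting quantiles carry both discretization and simulation error:

```python
import numpy as np

def simulate_U(q, n_steps=1000, n_rep=2000, seed=0):
    """Monte Carlo draws of U_q = B_q(1)' V_q^{-1} B_q(1) on a discrete grid."""
    rng = np.random.default_rng(seed)
    r = np.arange(1, n_steps + 1) / n_steps            # grid r = 1/n_steps, ..., 1
    draws = np.empty(n_rep)
    for i in range(n_rep):
        # Partial sums of N(0, 1/n_steps) increments approximate B_q(r).
        B = np.cumsum(rng.standard_normal((n_steps, q)), axis=0) / np.sqrt(n_steps)
        bridge = B - r[:, None] * B[-1]                  # B_q(r) - r B_q(1)
        V = bridge.T @ bridge / n_steps                  # Riemann sum for V_q
        draws[i] = B[-1] @ np.linalg.solve(V, B[-1])     # U_q for this replication
    return draws

u1 = simulate_U(q=1)
crit_90_95_99 = np.quantile(u1, [0.90, 0.95, 0.99])
```

Increasing `n_steps` and `n_rep` reduces the approximation error at a linear cost in run time.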
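Theorem 2.1. Under Assumptions 2.1 and 2.2, we have

$$N(\hat\theta_N - \theta)'W_N^{-1}(\hat\theta_N - \theta) \to_D U_q. \qquad (4)$$

So a $100(1-\alpha)\%$ confidence region for $\theta$ is

$$\{\theta: N(\hat\theta_N - \theta)'W_N^{-1}(\hat\theta_N - \theta) \le U_{q,\alpha}\},$$

where $U_{q,\alpha}$ is the $100(1-\alpha)$th percentile of the distribution of $U_q$.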
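For a scalar $\theta$ ($q = 1$) this region is an interval: solving $N(\hat\theta_N - \theta)^2/W_N \le U_{1,\alpha}$ for $\theta$ gives $\hat\theta_N \pm (U_{1,\alpha}W_N/N)^{1/2}$. The sketch below assumes $m = 1$ (so $N = n$), computes the recursive estimates $\hat\theta_t$ from the first $t$ observations, forms $W_N$ and returns the interval. The function name is hypothetical, and the critical value `crit` must be supplied by the user (e.g. from Lobato's 2001 table); each $\hat\theta_t$ is recomputed from scratch, which costs $O(n^2\log n)$ for the median but keeps the code transparent:

```python
import numpy as np

def sn_interval(x, estimator, crit):
    """Self-normalized interval for a scalar theta using recursive estimates (m = 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(1, n + 1)
    rec = np.array([estimator(x[:k]) for k in t])      # hat-theta_t from X_1, ..., X_t
    theta_n = rec[-1]
    W = np.sum((t * (rec - theta_n)) ** 2) / n**2      # W_N in the scalar case
    half_width = np.sqrt(crit * W / n)
    return theta_n - half_width, theta_n + half_width

rng = np.random.default_rng(2)
e = rng.standard_normal(301)
x = e[1:] + 0.5 * e[:-1]                 # MA(1) series whose marginal median is 0
# crit = 45.4 is assumed here as the 95% value of U_1; verify against Lobato (2001).
lo, hi = sn_interval(x, np.median, crit=45.4)
```

No bandwidth or block length appears anywhere: the only inputs are the data, the estimator and a fixed critical value.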



Proof of Theorem 2.1. Under Assumption 2.1 and $R_N = o_p(N^{-1/2})$, we have $N^{1/2}(\hat\theta_N - \theta) \to_D \Lambda B_q(1)$. Let $S_{IF}(t) = \sum_{j=1}^t IF(Y_j;F_m) - (t/N)\sum_{j=1}^N IF(Y_j;F_m)$. Then we can write

$$t(\hat\theta_t - \hat\theta_N) = t(\hat\theta_t - \theta) - t(\hat\theta_N - \theta) = S_{IF}(t) + (tR_t - tR_N), \quad t = 1,\ldots,N,$$

which implies that $W_N = N^{-2}\sum_{t=1}^N S_{IF}(t)S_{IF}(t)' + o_p(1)$ under Assumption 2.2. It then follows from Assumption 2.1 and the continuous mapping theorem that $W_N \to_D \Lambda V_q\Lambda'$, jointly with $N^{1/2}(\hat\theta_N - \theta) \to_D \Lambda B_q(1)$. Since $\Lambda$ is invertible, the stated result is obtained. $\diamond$

Remark 2.1. Assumption 2.1 is not primitive. To give primitive conditions, one can resort to mixing assumptions; see Phillips (1987), Assumption 2.1, which is originally due to Herrndorf (1984). Specifically, condition (3) holds if

$$E|IF(Y_t;F_m)|^{\beta} < \infty \quad\text{and}\quad \sum_{k=1}^{\infty}\alpha_k^{1-2/\beta} < \infty \quad\text{for some } \beta > 2,$$

where $(\alpha_k)_{k\in\mathbb{N}}$ are the strong mixing coefficients of $(X_t)$. The mixing assumption is mild and allows for a wide variety of time series models, such as finite order autoregressive moving average models (Pham and Tran 1985), bilinear models (Pham 1986) and generalized autoregressive conditional heteroscedasticity (GARCH) models (Carrasco and Chen 2002). Alternatively, one can impose the near-epoch dependence assumption, which allows for processes that are not mixing; see Davidson (2002) for more details. Assumption 2.2 is also mild. Here we show that it is verifiable for the class of smooth function models, which is sufficiently wide to include many statistics of practical interest in time series, such as the autocovariance, the autocorrelation and the Yule-Walker estimator. Let $\mu_Z = E(Z_t)$, where $Z_t = (Z_t^{(1)},\ldots,Z_t^{(p)})'$ is a multivariate stationary time series in $\mathbb{R}^p$. Let $\bar Z_N = N^{-1}\sum_{j=1}^N Z_j$ and $\theta = H(\mu_Z)\in\mathbb{R}$. Then $\hat\theta_N = H(\bar Z_N)$ and $IF(Z_t;F_m) = (Z_t - \mu_Z)'\partial H(\mu_Z)/\partial z$. By the mean value theorem, $R_N = \frac{1}{2}(\bar Z_N - \mu_Z)'\{\partial^2 H(\tilde Z_N)/\partial z\partial z'\}(\bar Z_N - \mu_Z)$, where $\tilde Z_N = \lambda\bar Z_N + (1-\lambda)\mu_Z$ for some $\lambda\in[0,1]$. Assumption 2.2 holds provided that $\|\partial^2 H(z)/\partial z\partial z'\|_2$ is bounded, $\bar Z_N - \mu_Z = O_p(N^{-1/2})$ and $N^{-2}\sum_{t=1}^N t^2E|\bar Z_t - \mu_Z|^4 = o(1)$, the latter two of which hold if $Z_t\in\mathcal{L}^4$ and $\sum_{k_1,\ldots,k_j\in\mathbb{Z}}|\mathrm{cum}(Z_0^{(p_0)},Z_{k_1}^{(p_1)},\ldots,Z_{k_j}^{(p_j)})| < \infty$ for any $(p_0,\ldots,p_j)\in\{1,\ldots,p\}^{j+1}$, $j = 1,2,3$. Here, for a $p\times p$ matrix $A$, $\|A\|_2$ denotes the matrix norm induced by the vector norm $\|x\|_2 = (\sum_{j=1}^p x_j^2)^{1/2}$.
By introducing a random normalization matrix $W_N$, which converges in distribution to $\Lambda V_q\Lambda'$ and therefore carries the scale information in $\Sigma(F_m) = \Lambda\Lambda'$, our proposal avoids the thorny issue of estimating $\Sigma(F_m)$ explicitly, and our statistic is asymptotically pivotal. The idea of using random normalization is not new; it has been applied by Lobato (2001) to the problem of testing for non-correlation. However, the formulation in Lobato (2001) is tailored to the testing problem, whereas our method is developed under a more general framework. A distinctive feature of our method is that we use recursive estimates of the quantity of interest in forming the normalization matrix. In particular, the normalization matrix in Lobato (2001) is an explicit function of the cumulative sum (CUSUM) process corresponding to the sample autocovariances, whereas ours involves recursive estimates, since the CUSUM process of the influence function is typically unobserved and may involve unknown nuisance parameters. From the proof of Theorem 2.1, we see that the use of recursive estimates allows us to express the CUSUM process based on the influence function as the difference between the process $\{t(\hat\theta_t - \hat\theta_N)\}_{t=1}^N$ and a negligible remainder term. The use of recursive estimates in normalization (standardization) has also been considered by Kuan and Lee (2006) and Lee (2007). However, their discussions were restricted to the robust testing context, where the statistic of interest is a function of residuals and the use of recursive residuals can remove the estimation effect. As pointed out by a referee, the use of recursive estimates computed on a sequence of increasing subsamples of the series is related to the notion of "scanning" proposed in McElroy and Politis (2007), in which a scan was defined as a collection of $n$ nested block subsamples of $\{X_1,\ldots,X_n\}$ of sizes $k = 1,\ldots,n$. The scan considered in our paper basically corresponds to a forward scan, i.e., $\{(X_1), (X_1,X_2), (X_1,X_2,X_3),\ldots,(X_1,\ldots,X_n)\}$. A natural question is whether the proposed method would work for other scans. It seems that other scans may work with a suitable modification of the distribution theory, but the practical gain is not clear. We leave this for future investigation.

2.1 Self-normalization versus fixed-b approach 

In theory, we can replace $W_N$ with any smooth functional of the process $\{\lfloor rN\rfloor(\hat\theta_{\lfloor rN\rfloor} - \hat\theta_N), r\in(0,1]\}$. The asymptotic distribution of the resulting statistic is pivotal and its percentiles can be obtained from simulations. Our particular choice of $W_N$ is somewhat arbitrary and is in part influenced by Lobato's (2001) proposal, which is closely linked to the fixed-b asymptotic scheme considered by Kiefer, Vogelsang and their coauthors (Kiefer et al. 2000; Kiefer and Vogelsang 2002a, 2002b, 2005; Bunzel, Kiefer and Vogelsang 2001). To elucidate these connections, we focus on the simple case $\theta = E(X_1)$.

In this case, $\hat\theta_t = t^{-1}\sum_{j=1}^t X_j$, $W_n = n^{-2}\sum_{t=1}^n\{\sum_{j=1}^t(X_j - \bar X_n)\}^2$, and expansion (2) holds with a vanishing remainder term. Equation (4) reduces to $n(\bar X_n - \theta)^2/W_n \to_D U_1$, which has been discussed in Section 2 of Lobato (2001). To construct a confidence interval (or to perform hypothesis testing) for $E(X_1)$, a standard approach is to find a consistent estimate of $\lim_{n\to\infty}\{n\,\mathrm{var}(\bar X_n)\} = \sum_{k=-\infty}^{\infty}\gamma(k)$. The commonly used lag window estimate admits the form

$$\sum_{j=-(n-1)}^{n-1} K\{j/(bn)\}\hat\gamma_n(j),$$

where $K(\cdot)$ is the kernel function and $bn$ is the bandwidth. In the standard asymptotic regime, the ratio of the bandwidth to the sample size, $b$, tends to 0 as $n\to\infty$ and the inference is based on the limiting normal or $\chi^2$ distribution, whereas $b\in(0,1]$ is held constant in the fixed-b asymptotics and the limiting distribution is nonstandard, depending on the kernel function and $b$. Kiefer and Vogelsang (2002a) showed that $2W_n$ is equal to the lag window estimate when $K(\cdot)$ is the Bartlett kernel, i.e., $K(x) = (1 - |x|)\mathbf{1}(|x| < 1)$, and $b = 1$. Therefore, the proposed self-normalized method can be regarded as a special case of the fixed-b approach. It is also worth noting that the self-normalized approach differs from that used in the independent and identically distributed (iid) setting, where one typically normalizes with the sample variance (Lai, de la Peña and Shao 2009). For a stationary time series, the sample variance is no longer suitable as the normalization factor, since the nuisance parameter is the long-run variance of $X_t$ (i.e., $\sum_{j=-\infty}^{\infty}\gamma(j)$) instead of the marginal variance of $X_t$ (i.e., $\gamma(0)$).
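The identity $2W_n = \sum_{|j|<n}K(j/n)\hat\gamma_n(j)$ for the Bartlett kernel with $b = 1$ can be checked numerically. In this sketch (function names are hypothetical), $W_n$ is built from the recursive means $\hat\theta_t = t^{-1}\sum_{j\le t}X_j$ exactly as in Section 2, and $\hat\gamma_n(j)$ uses the sample mean and divisor $n$, the convention under which the identity holds exactly:

```python
import numpy as np

def w_n_mean(x):
    """W_n for theta = E(X_1), built from the recursive means hat-theta_t."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(1, n + 1)
    rec = np.cumsum(x) / t                          # hat-theta_t = t^{-1} sum_{j<=t} X_j
    return np.sum((t * (rec - rec[-1])) ** 2) / n**2

def bartlett_lag_window(x, b=1.0):
    """sum_{|j|<n} K{j/(bn)} hat-gamma_n(j) with K(x) = (1 - |x|) 1(|x| < 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    total = 0.0
    for j in range(-(n - 1), n):
        k = abs(j)
        weight = max(0.0, 1.0 - k / (b * n))        # Bartlett kernel weight
        total += weight * (d[: n - k] @ d[k:]) / n  # hat-gamma_n(j), divisor n
    return total

rng = np.random.default_rng(4)
x = rng.standard_normal(250)
print(2 * w_n_mean(x), bartlett_lag_window(x))      # agree up to rounding error
```

The check works because $t(\hat\theta_t - \hat\theta_n) = \sum_{j\le t}(X_j - \bar X_n)$, so $W_n$ is a scaled sum of squared partial sums, which the Bartlett weights reproduce term by term.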

Compared with the standard approach, where the normalization (Studentization) factor is a consistent estimator of the asymptotic variance, the self-normalized approach adopts an inconsistent estimator as the normalization factor, which in a sense corresponds to 'inefficient Studentizing'. In what follows, we describe important implications of inefficient Studentizing for the size and power of hypothesis tests and for the coverage accuracy of confidence intervals. For the Gaussian location model, Jansson (2004) showed that

$$\sup_{x\in\mathbb{R}}\left|P\{n(\bar X_n - \theta)^2/W_n \le x\} - P(U_1 \le x)\right| = O\{n^{-1}\log(n)\}, \qquad (5)$$

which was further refined by Sun, Phillips and Jin (2008) under the fixed-b asymptotic framework by dropping the $\log(n)$ term. In the testing context, the implication of equation (5) is that the self-normalized approach controls the size better than the standard approach, whose error in rejection probability (ERP) is no better than $O(n^{-1/2})$ (Velasco and Robinson 2001). For heuristic and theoretical explanations of the better size property of the self-normalized approach compared with the standard approach, we refer the reader to Bunzel et al. (2001) and Sun et al. (2008). When testing for $\gamma(1) = 0$, Lobato (2001) showed that the local asymptotic power of the self-normalized approach is dominated by that of the standard approach. See also Kiefer et al. (2000) for a similar finding in the context of robust testing for linear regression models with autocorrelated errors. The phenomenon of "better size but less power" of the self-normalized approach is also consistent with earlier Monte Carlo results in Kiefer et al. (2000), Kiefer and Vogelsang (2002b, 2005), Bunzel et al. (2001) and Lobato (2001). See Sun et al. (2008) for an interesting theoretical explanation of this phenomenon for the fixed-b approach (with the self-normalized approach as a special case) using a loss function argument.

For confidence interval construction, the coverage probability corresponds to the size, so equation (5) implies that the coverage accuracy of the self-normalized approach is better than that offered by the standard approach, which is also confirmed in our simulation studies; see Section 4.3. The asymptotic analysis and simulation results in Kiefer and Vogelsang (2005) under the fixed-b framework suggest that, for a given kernel function, $b = 1$ corresponds to the least size distortion. This supports our choice $b = 1$ in confidence interval construction, since our self-normalized statistic is basically a special case of the fixed-b formulation with the Bartlett kernel and $b = 1$. It is possible, and in fact quite straightforward, to extend the fixed-b approach to the framework described in this section. We do not pursue this generalization, as it poses no additional methodological or technical difficulty in view of the argument used in Kiefer and Vogelsang (2002b, 2005). We also note the work by Phillips, Sun and Jin (2004, 2006, 2007), who estimated the spectral density (or long-run variance) by exponentiated kernels with bandwidth equal to the sample size. They developed the so-called fixed-exponent asymptotics, which are similar in spirit to the fixed-b asymptotics.

2.2 A new test for non-correlation 

Owing to the duality of confidence interval construction and hypothesis testing, we can extend our method to the hypothesis testing context. For example, suppose we are interested in testing

$$H_0: \gamma(1) = \cdots = \gamma(m-1) = 0 \quad\text{versus}\quad H_a: \gamma(j) \ne 0 \text{ for some } j = 1,\ldots,m-1.$$

For $t = 1,\ldots,N$, let $\bar X_t = (t+m-1)^{-1}\sum_{k=1}^{t+m-1}X_k$ and $\hat\gamma_t(j) = (t+m-1)^{-1}\sum_{k=1}^{t+m-1-j}(X_k - \bar X_t)(X_{k+|j|} - \bar X_t)$ be the estimates of $E(X_1)$ and $\gamma(j)$ based on the subsample of the first $t$ observations of $\{Y_h\}_{h=1}^N$. Denote $c_t(m-1) = \{\hat\gamma_t(1),\ldots,\hat\gamma_t(m-1)\}'$, $S_t = t\{c_t(m-1) - c_N(m-1)\}$ for $t = 1,\ldots,N$, and $J_{m-1} = N^{-2}\sum_{t=1}^N S_tS_t'$. Our test statistic is formed as

$$\tilde T_{m-1} = Nc_N(m-1)'J_{m-1}^{-1}c_N(m-1).$$

The hypothesis $H_0$ is rejected when $\tilde T_{m-1}$ is too large, by reference to the upper critical values of $U_{m-1}$. Further, let $Z_{kt} = (X_t - \bar X_n)(X_{t+k} - \bar X_n)$ for $k = 1,\ldots,m-1$, $Z_t = (Z_{1t},\ldots,Z_{(m-1)t})'$, $\hat S_t = \sum_{j=1}^t\{Z_j - c_N(m-1)\}$ and $\hat J_{m-1} = N^{-2}\sum_{t=1}^N \hat S_t\hat S_t'$. Then Lobato's test statistic is

$$T_{m-1} = Nc_N(m-1)'\hat J_{m-1}^{-1}c_N(m-1),$$

which has the same limiting distribution as $\tilde T_{m-1}$. The difference between $\tilde T_{m-1}$ and $T_{m-1}$ lies in the forms of their normalization matrices; for example, the recursive mean estimate $\bar X_t$ is used in $J_{m-1}$, whereas the sample mean $\bar X_n$ is used in $\hat J_{m-1}$. This leads to differences in their finite sample size and power performance, which will be elaborated in Section 4.1.
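The statistic $\tilde T_{m-1}$ can be computed directly from its definition. The sketch below (hypothetical function names) builds $c_t(m-1)$ from $X_1,\ldots,X_{t+m-1}$ with the recursive mean $\bar X_t$ and divisor $t+m-1$, forms $S_t$ and $J_{m-1}$, and returns the statistic; critical values of $U_{m-1}$ are not included and must come from Lobato's (2001) table or from simulation:

```python
import numpy as np

def recursive_acov(x, t, m):
    """c_t(m-1): lags 1..m-1 from X_1..X_{t+m-1}, centred at the recursive mean X-bar_t."""
    seg = x[: t + m - 1]
    L = seg.size
    d = seg - seg.mean()                                  # X_k - X-bar_t
    return np.array([d[: L - j] @ d[j:] for j in range(1, m)]) / L

def sn_noncorrelation_stat(x, m):
    """tilde-T_{m-1} = N c_N' J_{m-1}^{-1} c_N, with J_{m-1} = N^{-2} sum_t S_t S_t'."""
    x = np.asarray(x, dtype=float)
    N = x.size - m + 1
    c = np.array([recursive_acov(x, t, m) for t in range(1, N + 1)])   # shape (N, m-1)
    c_N = c[-1]                                           # uses all n observations
    S = np.arange(1, N + 1)[:, None] * (c - c_N)          # S_t = t {c_t - c_N}
    J = S.T @ S / N**2
    return float(N * c_N @ np.linalg.solve(J, c_N))

rng = np.random.default_rng(5)
x = rng.standard_normal(200)
T3 = sn_noncorrelation_stat(x, m=4)   # tests gamma(1) = gamma(2) = gamma(3) = 0 (K = 3)
```

Because both $c_N(m-1)$ and $J_{m-1}$ scale with the variance of the data, the statistic is invariant to affine transformations $aX_t + b$ with $a \ne 0$, which is one concrete sense in which it needs no variance estimate.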



3 Theoretical extensions 

To broaden the applicability of our methodology, we consider constructing confidence intervals for quantities that are functionals of $F_\infty$, i.e. the joint distribution of $\{X_t\}_{t\in\mathbb{Z}}$. To illustrate the idea, we first introduce the class of spectral means, which admit the form $G(f,\phi) = \int_0^\pi\phi(\lambda)f(\lambda)\,d\lambda$, where $f(\cdot)$ is the spectral density function of $\{X_t\}_{t\in\mathbb{Z}}$ and $\phi: [-\pi,\pi]\to\mathbb{R}$ is a symmetric function of bounded variation. A sample analogue of $f(\lambda)$ is the periodogram $I_n(\lambda) = (2\pi n)^{-1}|\sum_{t=1}^n(X_t - \bar X_n)e^{it\lambda}|^2$, and a natural estimator of $G(f,\phi)$ is $G(I_n,\phi) = \int_0^\pi\phi(\lambda)I_n(\lambda)\,d\lambda$. Often in practice, the quantity of interest is the normalized (ratio) version of $G(f,\phi)$, i.e. $R(f,\phi) = G(f,\phi)/G(f,1)$, which is estimated by its sample counterpart $R(I_n,\phi)$. Prominent examples include $\phi(\lambda) = 2\cos(m\lambda)$, $m\in\mathbb{N}$, and $\phi(\lambda) = \mathbf{1}_{[0,x]}(\lambda)$, $x\in[0,\pi]$. The former corresponds to $G(f,\phi) = \gamma(m)$ and, since $G(f,1) = \gamma(0)/2$, to $R(f,\phi) = 2\rho(m)$; these are covered, up to a constant factor, by the framework in Section 2. The latter corresponds to $G(I_n,\phi) = F_n(x) = \int_0^x I_n(\lambda)\,d\lambda$ and $R(I_n,\phi) = F_n(x)/F_n(\pi)$, which are $n^{1/2}$-consistent estimators of the spectral distribution function $G(f,\phi) = F(x)$ and its ratio counterpart $R(f,\phi) = F(x)/F(\pi)$. Note that both $F(x)$ and $F(x)/F(\pi)$ are functionals of $F_\infty$.
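The estimator $F_n(x)$ has a closed form in terms of sample autocovariances. Expanding $I_n(\lambda) = (2\pi)^{-1}\sum_{|k|<n}\hat\gamma_n(k)e^{ik\lambda}$, with $\hat\gamma_n(k)$ centred at $\bar X_n$ and divisor $n$, and integrating term by term gives $F_n(x) = (2\pi)^{-1}\{\hat\gamma_n(0)x + 2\sum_{k=1}^{n-1}\hat\gamma_n(k)\sin(kx)/k\}$, so in particular $F_n(\pi) = \hat\gamma_n(0)/2$. A short sketch (function names are hypothetical) computes $F_n(x)$, the ratio $F_n(x)/F_n(\pi)$, and the periodogram itself for a numerical cross-check:

```python
import numpy as np

def sample_acov(x):
    """hat-gamma_n(k) for k = 0, ..., n-1 (sample-mean centred, divisor n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    return np.array([d[: n - k] @ d[k:] for k in range(n)]) / n

def spectral_dist(x, upper):
    """F_n(upper) = int_0^upper I_n(lambda) d(lambda), via the autocovariance series."""
    gam = sample_acov(x)
    k = np.arange(1, gam.size)
    series = np.sum(gam[1:] * np.sin(k * upper) / k)
    return (gam[0] * upper + 2.0 * series) / (2.0 * np.pi)

def periodogram(x, lam):
    """I_n(lambda) = (2 pi n)^{-1} |sum_t (X_t - X-bar_n) exp(i t lambda)|^2 on a grid."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    t = np.arange(1, n + 1)
    return np.abs(np.exp(1j * np.outer(lam, t)) @ d) ** 2 / (2.0 * np.pi * n)

rng = np.random.default_rng(3)
x = rng.standard_normal(128)
F_half = spectral_dist(x, np.pi / 2)
ratio = F_half / spectral_dist(x, np.pi)      # estimates F(pi/2) / F(pi)
```

Integrating `periodogram` over a fine grid on $[0, x]$ with the trapezoidal rule should reproduce `spectral_dist` up to the quadrature error.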

Denote by $f_4(\cdot,\cdot,\cdot)$ the fourth-order cumulant spectral density of the process $X_t$. Under appropriate moment and weak dependence conditions (Brillinger 1969; Rosenblatt 1985; Dahlhaus 1985), we have

$$n^{1/2}\{G(I_n,\phi) - G(f,\phi)\} \to_D N\{0,\sigma^2(\phi)\},$$

where $\sigma^2(\phi) = 2\pi\{\int_0^\pi\phi^2(\lambda)f^2(\lambda)\,d\lambda + \int_0^\pi\int_0^\pi\phi(w_1)\phi(w_2)f_4(w_1,-w_1,-w_2)\,dw_1dw_2\}$. Confidence interval construction for $G(f,\phi)$ and $R(f,\phi)$ has been investigated by a few researchers. A standard approach is to find a consistent estimate of $\sigma^2(\phi)$ and apply the plug-in principle, which inevitably involves estimating the integral of the fourth-order cumulant spectra (Taniguchi 1982; Keenan 1987; Chiu 1988). The proposed procedures all involve the choice of a smoothing parameter, for which no theoretical or empirical guidance has been given. Other works that avoid direct estimation include Dahlhaus and Janas (1996) and Kreiss and Paparoditis (2003) on the frequency domain bootstrap method (Franke and Härdle 1992) and Nordman and Lahiri (2006) on the empirical-likelihood-based approach. A limitation of these works is that their methodology relies heavily on the assumption that $X_t$ is a linear process with independent and identically distributed errors, and may not be valid for more general stationary processes. Recently, Shao (2009) proposed a self-normalization-based approach that is widely applicable to a large class of stationary processes. It involves a bandwidth, which is chosen by using information criteria and a moving average sieve approximation. Simulation studies show reasonably good finite sample performance. However, a drawback of the method in Shao (2009) is that, for confidence intervals of $R(f,\phi)$, it may yield empty or meaningless confidence intervals when the sample size is small. In contrast, the method developed in this article always delivers meaningful and nonempty intervals, and there is no need to choose any tuning parameters in our procedure.

In what follows, we establish a theorem under a general framework that includes the (normalized) spectral mean as a special case. Suppose that the quantity of interest is $\theta = T(F_\infty)\in\mathbb{R}^q$ and its estimator is $\hat\theta_n = T_n(\rho_n)$, where $T_n$ is a functional of the $n$-dimensional distribution of $\{X_t\}_{t\in\mathbb{Z}}$ that takes values in $\mathbb{R}^q$. Denote by $\hat\theta_t = T_t(\rho_t)$ the estimator based on $(X_1,\ldots,X_t)$, $t = 1,\ldots,n$. The following theorem shows that the validity of our method described in Section 2 extends to a more general setting. The underlying idea is that if there is a sequence of approximating statistics for $\hat\theta_n$ that are functionals of the $B_n$-dimensional empirical distribution ($B_n$ can be fixed or grow with $n$; see Remark 3.1), and a sequence of approximating quantities $\theta_n$ for $\theta$, then our method still delivers an asymptotically valid confidence interval, provided that the approximation errors associated with the approximating statistics and $\theta_n$ are asymptotically negligible and similar regularity conditions hold for the expansion of the approximating statistics around $\theta_n$. For notational convenience, let $r_n = \lfloor rn\rfloor$ for $r\in(0,1]$.

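Theorem 3.1. Assume that there exist a sequence of positive integers $B_n$ and a sequence of approximating quantities $\theta_n$ such that $\theta_n$ is a functional of $F_{B_n}$ and satisfies $|\theta - \theta_n| = o(n^{-1/2})$. Let $Y_{tn} = (X_{t-B_n+1},\ldots,X_t)'$ and let $\rho^{B_n}_{r_n}$ be the empirical distribution based on $(Y_{1n},\ldots,Y_{r_nn})$ for $r\in(0,1]$. Further assume the expansion

$$T_{B_n}(\rho^{B_n}_{r_n}) - \theta_n = r_n^{-1}\sum_{t=1}^{r_n} IF(Y_{tn};F_{B_n}) + R_{r_nn},$$

where $E\{IF(Y_{tn};F_{B_n})\} = 0$ and

$$n^{-1/2}\sum_{t=1}^{r_n} IF(Y_{tn};F_{B_n}) \Rightarrow \Lambda B_q(r) \qquad (6)$$

for some lower triangular matrix $\Lambda$ with $\Lambda\Lambda'$ positive definite. Suppose that

(i) $$R_{nn} = o_p(n^{-1/2}) \quad\text{and}\quad n^{-2}\sum_{t=1}^{n}|tR_{tn}|^2 = o_p(1); \qquad (7)$$

(ii) $$T_{B_n}(\rho^{B_n}_n) - \hat\theta_n = o_p(n^{-1/2}) \quad\text{and}\quad n^{-2}\sum_{t=1}^{n}|t\{T_{B_n}(\rho^{B_n}_t) - \hat\theta_t\}|^2 = o_p(1). \qquad (8)$$

Then $n(\hat\theta_n - \theta)'W_n^{-1}(\hat\theta_n - \theta) \to_D U_q$, where $W_n = n^{-2}\sum_{t=1}^n t^2(\hat\theta_t - \hat\theta_n)(\hat\theta_t - \hat\theta_n)'$. Consequently, a $100(1-\alpha)\%$ confidence region for $\theta$ is $\{\theta: n(\hat\theta_n - \theta)'W_n^{-1}(\hat\theta_n - \theta) \le U_{q,\alpha}\}$.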

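Proof of Theorem 3.1. Let $\tilde\theta_t = T_{B_n}(\rho^{B_n}_t)$. Following the argument in the proof of Theorem 2.1, we can show that $n(\tilde\theta_n - \theta_n)'\tilde W_n^{-1}(\tilde\theta_n - \theta_n) \to_D U_q$, where $\tilde W_n = n^{-2}\sum_{t=1}^n t^2(\tilde\theta_t - \tilde\theta_n)(\tilde\theta_t - \tilde\theta_n)'$. The conclusion follows from assumption (8) and $|\theta - \theta_n| = o(n^{-1/2})$. $\diamond$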



Remark 3.1. If $B_n = m$ is fixed, then Theorem 3.1 generalizes Theorem 2.1 by allowing $\theta_n$ to depend on $n$, and $\hat\theta_n$ to differ slightly from the approximately linear statistic defined before. For example, if the quantity of interest is $\gamma(m-1)$, then Theorem 2.1 is only applicable to the statistic $\tilde\gamma_n(m-1)$, not to $\hat\gamma_n(m-1)$. With the formulation of Theorem 3.1, we can let $\hat\theta_{r_n} = \hat\gamma_{r_n}(m-1)$ for $r_n > m$, $T_{B_n}(\rho^{B_n}_{r_n}) = T_m(\rho^m_{r_n}) = \tilde\gamma_{r_n}(m-1)$ and $\theta_n = \theta = \gamma(m-1)$. It is easy to verify that the technical assumptions in Theorem 3.1 hold under mild moment and weak dependence conditions on $X_t$. The details are omitted. If $B_n\to\infty$ as $n\to\infty$, then we typically require $B_n/n\to 0$, as shown in the example of the spectral mean below.

To illustrate the verifiability of the assumptions in Theorem 3.1, we focus on the spectral mean. For simplicity, we assume that $E(X_t) = 0$ is known. The complication caused by the mean correction can be handled with additional routine technical details. Letting $\psi_k = (2\pi)^{-1}\int_0^\pi\phi(\lambda)e^{ik\lambda}\,d\lambda$, we have $G(I_n,\phi) = \sum_{k=1-n}^{n-1}\hat\gamma_n(k)\psi_k = \sum_{k=0}^{n-1}\hat\gamma_n(k)g_k$, with $g_k = \psi_k + \psi_{-k}$ if $k\ne 0$ and $g_0 = \psi_0$. Similarly, we have $G(f,\phi) = \sum_{k=0}^{\infty}\gamma(k)g_k$. Let $\theta_n = \sum_{k=0}^{B_n}\gamma(k)g_k$, and denote $\hat\gamma_r(k) = r^{-1}\sum_{t=1}^{r-|k|}X_tX_{t+|k|}$. Then

$$T_{B_n}(\rho^{B_n}_r) = \sum_{k=0}^{B_n}\hat\gamma_r(k)g_k, \qquad IF(Y_{tn};F_{B_n}) = \sum_{k=0}^{B_n}\{X_tX_{t-k} - \gamma(k)\}g_k,$$

and $R_{r,n} = -r^{-1}\sum_{k=0}^{B_n}\sum_{t=1}^{k}X_tX_{t-k}g_k$.

In the case of the spectral mean, the functional central limit theorem (6) has been established in Theorem 1 of Shao (2009). The following proposition shows that the other two key assumptions, (7) and (8), in Theorem 3.1 can be verified as well.

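Proposition 3.1. Assume that $1/B_n + B_n/n = o(1)$, $\sum_{j=0}^{\infty}g_j^2 < \infty$ and $\sum_{k=B_n}^{\infty}|\gamma(k)| = o(n^{-1/2})$. Further assume that

$$\sum_{k\in\mathbb{Z}}|\gamma(k)| < \infty \quad\text{and}\quad \sum_{k_1,k_2,k_3\in\mathbb{Z}}|\mathrm{cum}(X_0,X_{k_1},X_{k_2},X_{k_3})| < \infty. \qquad (9)$$

Then $r_n^2E|T_{B_n}(\rho^{B_n}_{r_n}) - \hat\theta_{r_n}|^2 = o(n)$ and $r_n^2E|R_{r_nn}|^2 = o(n)$ uniformly in $r\in(0,1]$.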
It is straightforward to see that assumptions (7) and (8) follow from the conclusion of Proposition 3.1. Note that $|\theta - \theta_n| = o(n^{-1/2})$ if $\sum_{k=B_n}^{\infty}|\gamma(k)| = o(n^{-1/2})$. Hence, the assumptions in Theorem 3.1 are all satisfied for the spectral mean. The normalized spectral mean can be treated in a similar fashion. We omit the details.
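As an illustration of Theorem 3.1 for the normalized spectral mean $F(x)/F(\pi)$, the sketch below computes the recursive estimates $\hat\theta_t = F_t(x)/F_t(\pi)$ for $t = 1,\ldots,n$ and the resulting interval $\hat\theta_n \pm (U_{1,\alpha}W_n/n)^{1/2}$. Following the verification above, it assumes $E(X_t) = 0$ is known, so $\hat\gamma_t(k) = t^{-1}\sum_{j=1}^{t-k}X_jX_{j+k}$ and $F_t(x) = \sum_k \hat\gamma_t(k)g_k$ with $g_0 = x/(2\pi)$ and $g_k = \sin(kx)/(\pi k)$. Function names are hypothetical, `crit` is supplied by the user, and the ratio is undefined if some $\hat\gamma_t(0) = 0$ (a probability-zero event for continuous data):

```python
import numpy as np

def recursive_acov_known_mean(x):
    """G[t-1, k] = hat-gamma_t(k) = t^{-1} sum_{j=1}^{t-k} X_j X_{j+k}, assuming E X_t = 0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    G = np.zeros((n, n))
    for k in range(n):
        # Running sums of X_j X_{j+k}; entry i covers j = 1..i+1, i.e. row t - 1 = k + i.
        G[k:, k] = np.cumsum(x[: n - k] * x[k:])
    return G / np.arange(1, n + 1)[:, None]

def ratio_weights(n, upper):
    """g_k for phi = 1_[0, upper]: g_0 = upper / (2 pi), g_k = sin(k upper) / (pi k)."""
    k = np.arange(1, n)
    return np.concatenate(([upper / (2 * np.pi)], np.sin(k * upper) / (np.pi * k)))

def sn_interval_spectral_ratio(x, upper, crit):
    """Estimate of F(upper)/F(pi) and its self-normalized interval (q = 1)."""
    G = recursive_acov_known_mean(x)
    n = G.shape[0]
    F_upper = G @ ratio_weights(n, upper)      # hat F_t(upper), t = 1, ..., n
    F_pi = G[:, 0] / 2.0                       # hat F_t(pi) = hat-gamma_t(0) / 2
    rec = F_upper / F_pi                       # recursive ratio estimates hat-theta_t
    t = np.arange(1, n + 1)
    theta_n = rec[-1]
    W = np.sum((t * (rec - theta_n)) ** 2) / n**2
    half = np.sqrt(crit * W / n)
    return theta_n, theta_n - half, theta_n + half

rng = np.random.default_rng(6)
x = rng.standard_normal(400)                   # white noise: F(pi/2)/F(pi) = 1/2
est, lo, hi = sn_interval_spectral_ratio(x, np.pi / 2, crit=45.4)   # assumed U_1 value
```

The running-sum construction yields all $\hat\gamma_t(k)$ in $O(n^2)$ operations, so the recursive estimates add little cost beyond computing $\hat\theta_n$ itself.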

4 Finite sample performance 

Through Monte Carlo simulations, we investigate the size and power properties of the test statistic $\tilde T_K$ in Section 4.1, and the empirical coverage probabilities of confidence intervals for (normalized) spectral means, the median and the unknown parameter vector of time series models estimated by M-estimation in Section 4.2. We also compare the self-normalized approach with the standard and bootstrap approaches via simulations in Section 4.3.

4.1 Size and power of the $\tilde T_K$ statistic

We first investigate the size of $\tilde T_K$ and compare it with the $T_K$ test of Lobato (2001) and the $Q_K$ test of Lobato et al. (2002) at $K = 1, 3, 5$. Only the case $K = 1$ is examined for $T_K$ in Lobato (2001). The $Q_K$ test corresponds to efficient Studentization, so a bandwidth parameter is involved in the consistent estimation of the asymptotic covariance matrix of the first $K$ sample correlations. Here we adopt an automatic procedure as used in Lobato et al. (2002), i.e., we employ AR(1) prewhitening and select the bandwidth by using formula (2.2) of Newey and West (1994) with weights equal to one and lag truncation equal
to 2(n/100)^/^. Let ut stand for a sequence of iid standard normal random variables. For 
the comparison, we use the same models as studied in Lobato (2001): (1) iid $N(0,1)$; (2) iid $t(6)$; (3) the demeaned standard log-normal; (4) the 1-dependent process $X_t = u_tu_{t-1}$; (5) the heteroscedastic process $X_t = s_tu_tu_{t-1}$, where $s_t$ is the infinite repetition of the sequence $\{1,1,1,2,3,1,1,1,1,2,4,6\}$; (6) the uncorrelated non-martingale difference process $X_t = u_{t-2}u_{t-1}(u_{t-2} + u_t + 1)$; (7) the GARCH(1,1) process $X_t = u_t\sigma_t$, where $\sigma_t^2 = 0.001 + 0.02X_{t-1}^2 + 0.8\sigma_{t-1}^2$; (8) the bilinear model $X_t = u_t + 0.5u_{t-1}X_{t-2}$.
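For readers who wish to reproduce the design, the sketch below draws $X_1,\ldots,X_n$ from models (1) to (8) as listed, with $u_t$ iid standard normal. Several details are not specified in the text and are our assumptions: the burn-in length for the recursive models (7) and (8), the initial $\sigma^2$ in model (7) (set to the unconditional variance $0.001/0.18$), and the phase of $s_t$ in model (5) (starting at the first element of the pattern). The function name is hypothetical:

```python
import numpy as np

S_PATTERN = np.array([1, 1, 1, 2, 3, 1, 1, 1, 1, 2, 4, 6], dtype=float)

def simulate_model(model, n, rng, burn=200):
    """Draw X_1..X_n from models (1)-(8); burn-in and start-up values are assumptions."""
    if model == 1:
        return rng.standard_normal(n)
    if model == 2:
        return rng.standard_t(6, size=n)
    if model == 3:
        return np.exp(rng.standard_normal(n)) - np.exp(0.5)   # E exp(u) = exp(1/2)
    if model in (4, 5, 6):
        u = rng.standard_normal(n + 2)
        cur, lag1, lag2 = u[2:], u[1:-1], u[:-2]              # u_t, u_{t-1}, u_{t-2}
        if model == 4:
            return cur * lag1
        if model == 5:
            return np.resize(S_PATTERN, n) * cur * lag1        # s_t repeats the pattern
        return lag2 * lag1 * (lag2 + cur + 1.0)
    u = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    if model == 7:
        sig2 = 0.001 / (1.0 - 0.02 - 0.8)                      # unconditional variance
        for t in range(n + burn):
            prev = x[t - 1] if t > 0 else 0.0
            sig2 = 0.001 + 0.02 * prev**2 + 0.8 * sig2
            x[t] = u[t] * np.sqrt(sig2)
    elif model == 8:
        for t in range(n + burn):
            x[t] = u[t] + (0.5 * u[t - 1] * x[t - 2] if t >= 2 else 0.0)
    else:
        raise ValueError("model must be an integer from 1 to 8")
    return x[burn:]                                            # discard the burn-in

rng = np.random.default_rng(7)
samples = {m: simulate_model(m, 100, rng) for m in range(1, 9)}
```

Models (1) to (6) need no burn-in because they depend on at most two lagged innovations.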

Two sample sizes, $n = 100$ and $n = 500$, are investigated with 5000 replications. As seen from Table 1, the size distortion increases as $K$ increases, and it decreases as the sample size grows, with the improvement being almost uniform over all the models and methods.
The test statistic Tk tends to produce a higher size than Tk- When K = 3 and K = 5, Tk 
is substantially under-sized for models (3)-(6) at n = 100, whereas the size distortion is
noticeably less for Tk- In contrast, for models (1), (7) and (8), Tk is outperformed by Tk 
in terms of size distortion. A comparison of the size of $Q_K$ with that of $\tilde T_K$ and $T_K$ shows that the size performance of $Q_K$ is less satisfactory. The $Q_K$ test tends to be undersized, and its size distortion appears to be very severe for some models (e.g. models (5) and (6)) at $K = 3$ and $K = 5$ even when $n = 500$. Since the size distortion is closely related to the bandwidth selection algorithm (Newey and West 1994), this suggests that the particular data-driven bandwidth used here does not perform uniformly well across different models, even for a large sample size.

Please insert Table 1 here!
To investigate the power, we reconsider the models used in Lobato (2001), i.e., the AR(1) model with innovations following either the GARCH(1,1) process specified in (7) above or the bilinear model in (8). The autoregressive coefficient $\rho$ varies from 0.1 to 0.5 with a spacing of 0.1. The sample size is $n = 100$ and the number of replications is 5000. Table 2 shows the size-adjusted empirical rejection percentages for $K = 1, 3, 5$ at the 5% and 10% levels. In general, the larger $K$ is, the lower the rejection rate becomes. For both models, the powers of the two $T_K$ statistics are fairly close. As for $Q_K$, its power advantage over the two $T_K$ statistics is pronounced when $\rho = 0.2, 0.3, 0.4$, but seems to diminish as $K$ increases from 1 to 3 and 5. When $\rho = 0.5$ and $K = 3$ or $K = 5$, the power of $Q_K$ is close to, or even slightly lower than, that of the two $T_K$ statistics in some cases, which calls for further theoretical investigation. Overall, it seems fair to conclude that $Q_K$ has moderately more power but worse size than the two $T_K$ statistics, whose size and power properties are comparable. This observation is consistent with the "better size but less power" property of the self-normalized approach relative to its efficiently Studentized counterpart.
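Size-adjusted rejection percentages can be computed by calibrating the critical value on statistics simulated under the null and then counting exceedances under the alternative. The sketch below assumes that large values of the statistic lead to rejection, as for $T_K$ and $Q_K$. The helper name `size_adjusted_power` and the placeholder chi-square draws in the usage lines are hypothetical. Which null design is used for calibration is left to the user.

```python
import numpy as np

def size_adjusted_power(null_stats, alt_stats, level):
    """Percentage of alternative replications exceeding the empirical (1 - level) null quantile.

    null_stats: statistics simulated under a null model (used only to calibrate the critical value).
    alt_stats: statistics simulated under the alternative of interest.
    Assumes large values of the statistic are evidence against the null.
    """
    crit = np.quantile(np.asarray(null_stats, dtype=float), 1.0 - level)
    return 100.0 * float(np.mean(np.asarray(alt_stats, dtype=float) > crit))

# Hypothetical usage: chi-square draws stand in for simulated test statistics.
rng = np.random.default_rng(0)
null_draws = rng.chisquare(df=1, size=5000)
alt_draws = rng.noncentral_chisquare(df=1, nonc=4.0, size=5000)
power_5 = size_adjusted_power(null_draws, alt_draws, 0.05)
```

Because the critical value is taken from the simulated null distribution rather than from the asymptotic one, tests with different size distortions are compared on an equal footing.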

Please insert Table 2 here!

4.2 Spectral means, median and M-estimators

In this subsection, we first examine the coverage probability for the spectral means $\gamma(1)$ and $F(\pi/2)$ as well as their ratio counterparts $\rho(1)$ and $F(\pi/2)/F(\pi)$. Let $B$ be the backward shift operator, let $\varepsilon_{1t}$ and $\varepsilon_{2t}$ be iid with $N(0,1)$ and $t(5)$ distributions respectively, and let $\varepsilon_{3t} = \varepsilon_{1t}(0.5\varepsilon_{3,t-1}^2 + 0.3)^{1/2}$, which follows an ARCH(1) process. We follow the setup in Shao (2009) and consider two sample sizes, $n = 150$ and $n = 600$, and the following six models: $M_1$: $(1 - 0.7B)X_t = \varepsilon_{1t}$; $M_2$: $(1 - 0.7B)X_t = 0.6^{1/2}\varepsilon_{2t}$; $M_3$: $(1 - 0.7B)X_t = \varepsilon_{3t}/0.6^{1/2}$; $M_4$: $X_t = (1 + 0.8B)\varepsilon_{1t}$; $M_5$: $X_t = (1 + 0.8B)\{0.6^{1/2}\varepsilon_{2t}\}$; $M_6$: $X_t = (1 + 0.8B)\{\varepsilon_{3t}/0.6^{1/2}\}$. In these models, the variances of the $t(5)$ and ARCH(1) innovations are standardized to 1. The number of replications is 1000. As seen from Tables 3 and 4, the coverages offered by our method are comparable with those delivered by Shao's (2009) approach for all models and sample sizes under consideration. The coverages of the intervals for spectral means (e.g., $\gamma(1)$) are farther from the nominal level than those for their ratio counterparts (e.g., $\rho(1)$), which is consistent with the finding in Shao (2009). For both $\rho(1)$ and $F(\pi/2)/F(\pi)$, a portion of the intervals are empty at $n = 150$ using the method in Shao (2009), whereas our approach always produces a non-empty interval of the form $[L, U]$. It is also worth noting that the coverages of the new method for models with ARCH errors are close to the nominal level when $n = 600$, suggesting that it is applicable to linear processes with dependent innovations. Since the new method is bandwidth free, has comparably good finite sample coverage and is widely applicable, it seems preferable to the method of Shao (2009).
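The following is a minimal simulation sketch of models $M_1$-$M_6$, together with the lag-1 sample autocorrelation and a plug-in estimate of $F(\pi/2)/F(\pi)$. It assumes $F(\lambda) = \int_0^\lambda f(w)\,dw$, estimated by integrating the periodogram. This gives the closed form $\hat F(\lambda) = (2\pi)^{-1}\{\lambda\hat\gamma(0) + 2\sum_{k=1}^{n-1}\hat\gamma(k)\sin(k\lambda)/k\}$. If $F$ is normalized by a different constant factor, the ratio is unaffected. The function names and the burn-in length are hypothetical.

```python
import numpy as np

def arch_innovations(m, rng):
    """eps_{3t} = eps_{1t} (0.5 eps_{3,t-1}^2 + 0.3)^{1/2}; stationary variance 0.3 / (1 - 0.5) = 0.6."""
    e1 = rng.standard_normal(m)
    e3 = np.empty(m)
    prev = 0.0
    for t in range(m):
        prev = e1[t] * np.sqrt(0.5 * prev**2 + 0.3)
        e3[t] = prev
    return e3

def simulate_M(model, n, rng, burn=200):
    """Simulate models M1-M6 with unit-variance innovations; the burn-in removes start-up effects."""
    m = n + burn
    if model in (1, 4):
        e = rng.standard_normal(m)
    elif model in (2, 5):
        e = np.sqrt(0.6) * rng.standard_t(5, size=m)     # var t(5) = 5/3, so 0.6 * 5/3 = 1
    elif model in (3, 6):
        e = arch_innovations(m, rng) / np.sqrt(0.6)
    else:
        raise ValueError("model must be 1, ..., 6")
    if model <= 3:                                       # (1 - 0.7B) X_t = e_t
        x = np.empty(m)
        x[0] = e[0]
        for t in range(1, m):
            x[t] = 0.7 * x[t - 1] + e[t]
    else:                                                # X_t = (1 + 0.8B) e_t
        x = e.copy()
        x[1:] += 0.8 * e[:-1]
    return x[burn:]

def acov(x, k):
    """Mean-corrected sample autocovariance at lag k with divisor n."""
    xc = x - x.mean()
    return np.dot(xc[k:], xc[:len(x) - k]) / len(x)

def spectral_ratio(x):
    """F_hat(pi/2) / F_hat(pi) with F_hat(lam) = integral of the periodogram over [0, lam]."""
    n = len(x)
    g = np.array([acov(x, k) for k in range(n)])
    k = np.arange(1, n)
    def F(lam):
        return (lam * g[0] + 2.0 * np.sum(g[1:] * np.sin(k * lam) / k)) / (2.0 * np.pi)
    return F(np.pi / 2) / F(np.pi)
```

Since the periodogram is non-negative, the estimated ratio always lies in $[0, 1]$, and $\hat F(\pi) = \hat\gamma(0)/2$.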

As suggested by a referee, we also include the coverage percentages for $\gamma(1)$ and $\rho(1)$ using the efficiently Studentized approach, which involves consistent estimation of the asymptotic variances of $\hat\gamma_n(1)$ and $\hat\rho_n(1)$. In particular, we follow the idea presented in Lobato et al. (2002) and slightly modify their procedure. Let $\hat\gamma = (\hat\gamma_n(0), \hat\gamma_n(1))'$, $\gamma = (\gamma(0), \gamma(1))'$, $w_{0t} = (X_t - \bar X_n)^2$ and $w_{1t} = (X_t - \bar X_n)(X_{t-1} - \bar X_n)$, $t = 2, \ldots, n$. Under suitable conditions, we have $\sqrt{n}(\hat\gamma - \gamma) \to_D N(0, V)$, where $V$ is a $2 \times 2$ matrix with elements $(V_{00}, V_{01}; V_{10}, V_{11})$. We estimate $V$ by applying the lag window method to $w_t = (w_{0t}, w_{1t})'$ for $t = 2, \ldots, n$, i.e.,

$$\hat V = (n-1)^{-1}\sum_{j}\sum_{t}K(j/l_n)(w_t - \bar w)(w_{t+j} - \bar w)',$$

where the sums run over all $j$ and $t$ with $2 \le t, t + j \le n$, $l_n$ is the bandwidth and $\bar w = (n-1)^{-1}\sum_{t=2}^{n}w_t$. We use the Bartlett kernel for $K$ and the same bandwidth selection algorithm (Newey and West 1994) as in the $Q_K$ test. To estimate the asymptotic variance of $\sqrt{n}\{\hat\rho_n(1) - \rho(1)\}$, we plug the estimates of the unknown quantities into equation (1) of Lobato et al. (2002). The resulting estimate is $\hat\gamma_n(0)^{-2}[\hat V_{11} - \hat\rho_n(1)\hat V_{10} - \hat\rho_n(1)\hat V_{01} + \hat\rho_n(1)^2\hat V_{00}]$. As seen from Table 3, the efficiently Studentized approach exhibits undercoverage in the case of $\gamma(1)$, and its coverage is noticeably worse than those of the other two methods for all the models and sample sizes under consideration. For $\rho(1)$, the efficiently Studentized approach delivers reasonably good coverage, with apparent overcoverage for models with $t(5)$ errors; see the results for $M_2$ and $M_5$. It is not clear why the coverage gets worse for some models (e.g., $M_2$, $M_4$ and $M_5$) when the sample size increases from 150 to 600. Nevertheless, the coverage performance of the self-normalized approach is at least not inferior to that of the efficiently Studentized approach.
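The efficiently Studentized interval for $\rho(1)$ described above can be sketched as follows. For simplicity, the bandwidth is passed in as a fixed number. The simulations instead choose it with the Newey and West (1994) algorithm and AR(1) prewhitening. The function names are hypothetical. The variance estimate is the delta-method expression $\hat\gamma_n(0)^{-2}[\hat V_{11} - \hat\rho_n(1)\hat V_{10} - \hat\rho_n(1)\hat V_{01} + \hat\rho_n(1)^2\hat V_{00}]$, i.e., $a'\hat Va/\hat\gamma_n(0)^2$ with $a = (-\hat\rho_n(1), 1)'$.

```python
import numpy as np

def bartlett_lrv(w, bandwidth):
    """Lag-window long-run covariance of the rows of w with Bartlett weights 1 - j / bandwidth."""
    w = np.asarray(w, dtype=float)
    m = w.shape[0]
    d = w - w.mean(axis=0)
    V = d.T @ d / m                                       # lag-0 term
    for j in range(1, min(int(np.ceil(bandwidth)), m)):
        G = d[j:].T @ d[:-j] / m                          # sum over t of d_{t+j} d_t' / m
        V += (1.0 - j / bandwidth) * (G + G.T)            # lags +j and -j
    return V

def studentized_ci_rho1(x, bandwidth, z=1.959963984540054):
    """Normal-approximation interval for rho(1) with a plug-in delta-method variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    w = np.column_stack([xc[1:] ** 2, xc[1:] * xc[:-1]])  # (w_{0t}, w_{1t}), t = 2, ..., n
    g0 = np.dot(xc, xc) / n
    g1 = np.dot(xc[1:], xc[:-1]) / n
    rho = g1 / g0
    V = bartlett_lrv(w, bandwidth)
    avar = (V[1, 1] - rho * V[1, 0] - rho * V[0, 1] + rho**2 * V[0, 0]) / g0**2
    half = z * np.sqrt(max(avar, 0.0) / n)
    return rho - half, rho + half
```

The Bartlett weights make $\hat V$ positive semi-definite, so the quadratic form is non-negative and the square root is always defined. With `bandwidth=1`, only the lag-0 term remains.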

Please insert Tables 3 and 4 here!
In what follows, we further examine the coverage accuracy for $\mathrm{med}(X_1)$ on the basis of models $M_1$-$M_6$. We also examine it for the unknown parameter vector, estimated by least absolute deviation (LAD), in models $M_1$-$M_3$ and the following three AR(2) models: $M_7$: $(1 - \phi_1B - \phi_2B^2)X_t = \varepsilon_{1t}$; $M_8$: $(1 - \phi_1B - \phi_2B^2)X_t = 0.6^{1/2}\varepsilon_{2t}$; $M_9$: $(1 - \phi_1B - \phi_2B^2)X_t = \varepsilon_{3t}/0.6^{1/2}$. The true value of $(\phi_1, \phi_2)$ is $(0.6, 0.35)$. For models $M_7$-$M_9$, we estimate $(\phi_1, \phi_2)$ by $(\hat\phi_{1n}, \hat\phi_{2n}) = \arg\min_{(\phi_1, \phi_2)}\sum_{t=3}^{n}|X_t - \phi_1X_{t-1} - \phi_2X_{t-2}|$. Table 5(a) shows that, in the case of the median, there is undercoverage for all the models and sample sizes, although the coverage is fairly close to the nominal level when $n = 600$. In addition, differences in the innovation distribution do not seem to affect the coverage much. For the LAD estimates, overcoverage occurs for both sample sizes and all models, and there appears to be more distortion at the 90% level than at the 95% level. From Table 5(b), the overcoverage appears more severe for models $M_7$-$M_9$ than for models $M_1$-$M_3$, because the asymptotic approximation tends to become worse as $q$ (i.e., the number of unknown parameters) gets larger.
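For the median, the self-normalized interval only requires the recursive estimates $\hat\theta_t = \mathrm{med}(X_1, \ldots, X_t)$. The sketch below assumes the scalar self-normalized statistic $n(\hat\theta_n - \theta)^2/W_n^2$ with $W_n^2 = n^{-2}\sum_{t=1}^{n}t^2(\hat\theta_t - \hat\theta_n)^2$, whose limit is $U = W(1)^2/\int_0^1\{W(r) - rW(1)\}^2dr$. The critical value is approximated by Monte Carlo rather than read from a table. The function names, grid size and number of draws are hypothetical choices.

```python
import numpy as np

def sn_critical_value(level, n_rep=4000, n_grid=400, seed=0):
    """Monte Carlo quantile of U = W(1)^2 / int_0^1 {W(r) - r W(1)}^2 dr, W standard Brownian motion."""
    rng = np.random.default_rng(seed)
    W = np.cumsum(rng.standard_normal((n_rep, n_grid)), axis=1) / np.sqrt(n_grid)  # W(i / n_grid)
    r = np.arange(1, n_grid + 1) / n_grid
    bridge = W - r * W[:, -1:]                               # W(r) - r W(1)
    U = W[:, -1] ** 2 / np.mean(bridge ** 2, axis=1)         # Riemann sum for the integral
    return float(np.quantile(U, level))

def sn_interval_median(x, crit):
    """Self-normalized interval for med(X_1) built from the recursive medians, t = 1, ..., n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rec = np.array([np.median(x[:t]) for t in range(1, n + 1)])
    theta_n = rec[-1]
    t = np.arange(1, n + 1)
    wn2 = np.sum(t**2 * (rec - theta_n) ** 2) / n**2          # normalizer W_n^2
    half = np.sqrt(crit * wn2 / n)                           # {theta : n (theta_n - theta)^2 / W_n^2 <= crit}
    return theta_n - half, theta_n + half
```

Inverting $n(\hat\theta_n - \theta)^2/W_n^2 \le c$ gives the symmetric interval $\hat\theta_n \pm (cW_n^2/n)^{1/2}$, so no bandwidth or block size enters. The naive recursion costs $O(n^2\log n)$. A running-median update would be faster but is unnecessary for $n \le 600$.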

Please insert Table 5 here!

4.3 Block bootstrap, normal approximation and self-normalization 

The focus of this subsection is to compare the finite sample coverages of the self-normalized approach with those of the standard approach, which involves consistent estimation of the asymptotic variance matrix. We consider the AR(1) model $X_t = \rho X_{t-1} + u_t$, where $u_t \sim$ iid $N(0,1)$ and $\rho = 0$, 0.5 and 0.8. The sample size is $n = 50$ and the number of bootstrap replications is 1000. We examine the empirical coverages of confidence intervals for $E(X_1)$, $\mathrm{med}(X_1)$ and $\rho(1)$ based on 2000 replications, and for $F(\pi/2)/F(\pi)$ based on 500 replications.

For linear regression models with autocorrelated errors, Gonçalves and Vogelsang (2008) showed that the conventional block bootstrap test can be more accurate than the standard normal approximation under the fixed-b asymptotic framework. In that test, the bootstrapped standard error has the same form as the one used on the original data. Their simulation results also indicate that, when the block size is suitably chosen, it may outperform the fixed-b approximation. In light of these findings in the testing context, we incorporate the block bootstrap method into our simulation studies. Let $\{X_t^*\}_{t=1}^{n}$ denote the bootstrap sample with block size $l = 1, \ldots, 15$. If $n/l$ is not an integer, we use a fraction of the last sampled block. We compare the empirical coverages of the following four schemes at the 95% nominal level.

(1) Moving block bootstrap without Studentizing. We approximate the sampling distribution of $\sqrt{N}(\hat\theta_N - \theta)$ by that of $\sqrt{N}(\hat\theta_N^* - \hat\theta_N)$, where $\hat\theta_N^*$ is the functional $T$ applied to the bootstrap sample. The 95% confidence interval of $\theta$ is then $[\hat\theta_N - q_{0.975}/\sqrt{N}, \hat\theta_N - q_{0.025}/\sqrt{N}]$, where $q_\alpha$ denotes the $100\alpha\%$ percentile of $\sqrt{N}(\hat\theta_N^* - \hat\theta_N)$ based on 1000 bootstrap replicates.

(2) Normal approximation. To use the standard normal approximation, we need a consistent estimate of the asymptotic variance. Here we use the block bootstrap variance estimator, denoted by $\hat\sigma^2$, whose consistency has been shown by Künsch (1989) and Bühlmann and Künsch (1995) for a large class of approximately linear statistics. The resulting confidence interval for $\theta$ is $[\hat\theta_N - 1.96\hat\sigma/\sqrt{N}, \hat\theta_N + 1.96\hat\sigma/\sqrt{N}]$. Other types of bandwidth-dependent consistent estimates are available, but they require a case-by-case study. In contrast, the block bootstrap treats the consistent estimation of asymptotic variances in a unified way for all the cases, and its implementation is very straightforward.

(3) Moving block bootstrap with inefficient Studentizing. We approximate the sampling distribution of $N(\hat\theta_N - \theta)'W_N^{-1}(\hat\theta_N - \theta)$ by its bootstrap counterpart $N(\hat\theta_N^* - \hat\theta_N)'(W_N^*)^{-1}(\hat\theta_N^* - \hat\theta_N)$, where $W_N^*$ is obtained by plugging the bootstrap sample into $W_N$. The 95% confidence interval for $\theta$ is obtained by solving

$$N(\hat\theta_N - \theta)'W_N^{-1}(\hat\theta_N - \theta) \le U^*_{0.95},$$

where $U^*_{0.95}$ stands for the 95% percentile of $N(\hat\theta_N^* - \hat\theta_N)'(W_N^*)^{-1}(\hat\theta_N^* - \hat\theta_N)$ based on 1000 bootstrap replicates.

(4) The self-normalization-based approximation; compare Theorems 2.1 and 3.1.
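Schemes (1)-(3) can be sketched for a scalar statistic as follows. This is an illustration only, under several assumptions:

- The statistic is supplied through its recursive estimates (here the sample mean, via the hypothetical helper `recursive_mean`).
- The block bootstrap variance in scheme (2) is approximated by the Monte Carlo variance of the bootstrap replicates.
- The normalizer in scheme (3) has the scalar form $W_N^2 = N^{-2}\sum_{t=1}^{N}t^2(\hat\theta_t - \hat\theta_N)^2$.

All function names are hypothetical.

```python
import numpy as np

def mbb_sample(x, block, rng):
    """Moving block bootstrap resample of length n; the last block is truncated if needed."""
    x = np.asarray(x)
    n = len(x)
    n_blocks = -(-n // block)                                  # ceil(n / block)
    starts = rng.integers(0, n - block + 1, size=n_blocks)     # overlapping blocks of length `block`
    idx = (starts[:, None] + np.arange(block)[None, :]).ravel()[:n]
    return x[idx]

def sn_normalizer(rec):
    """W_N^2 = N^{-2} sum_t t^2 (theta_t - theta_N)^2 from recursive estimates theta_1, ..., theta_N."""
    N = len(rec)
    t = np.arange(1, N + 1)
    return np.sum(t**2 * (rec - rec[-1]) ** 2) / N**2

def recursive_mean(x):
    return np.cumsum(x) / np.arange(1, len(x) + 1)

def mbb_intervals(x, recursive, block, n_boot=1000, seed=0):
    """95% intervals from schemes (1), (2) and (3) for a scalar statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    N = len(x)
    rec = recursive(x)
    theta, wn2 = rec[-1], sn_normalizer(rec)
    roots, stud = [], []
    for _ in range(n_boot):
        rb = recursive(mbb_sample(x, block, rng))
        roots.append(np.sqrt(N) * (rb[-1] - theta))           # sqrt(N)(theta* - theta_hat)
        stud.append(N * (rb[-1] - theta) ** 2 / sn_normalizer(rb))
    q025, q975 = np.quantile(roots, [0.025, 0.975])
    scheme1 = (theta - q975 / np.sqrt(N), theta - q025 / np.sqrt(N))
    sigma = np.std(roots)                                      # Monte Carlo bootstrap standard deviation
    scheme2 = (theta - 1.96 * sigma / np.sqrt(N), theta + 1.96 * sigma / np.sqrt(N))
    half = np.sqrt(np.quantile(stud, 0.95) * wn2 / N)          # invert N(theta_hat - theta)^2 / W_N^2 <= U*
    scheme3 = (theta - half, theta + half)
    return scheme1, scheme2, scheme3
```

For example, `mbb_intervals(x, recursive_mean, block=5)` returns the three intervals for $E(X_1)$. In the scalar case, scheme (3) reduces to $\hat\theta_N \pm (U^*_{0.95}W_N^2/N)^{1/2}$, which has the same form as the self-normalized interval but a bootstrap critical value.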

In Figures 1-4, we plot the empirical coverage probabilities and the ratio of the mean interval widths to that delivered by the self-normalized method for $E(X_1)$, $\mathrm{med}(X_1)$, $\rho(1)$ and $F(\pi/2)/F(\pi)$, respectively. The labels "BB-Nostud", "N(0,1)", "BB-Stud" and "Self-Norm" in the figures correspond to schemes (1)-(4) described above. For both $E(X_1)$ and $\mathrm{med}(X_1)$, Figures 1 and 2 show that all methods lead to undercoverage. For $E(X_1)$, the coverages of the moving block bootstrap without Studentizing are comparable with those of the normal approximation. For $\mathrm{med}(X_1)$, they are noticeably inferior to the normal approximation uniformly in the block sizes examined. The coverages of both methods deteriorate quickly as the correlation strengthens. The moving block bootstrap with inefficient Studentizing shows very good coverage across the range of block sizes and outperforms the self-normalized method for all block sizes when $\rho = 0.8$. As far as interval length is concerned, the intervals from the normal approximation and the moving block bootstrap without Studentizing are of similar widths and are shorter than the self-normalization-based intervals. This is consistent with the loss of local asymptotic power for the self-normalized method (Lobato 2001), as a test statistic that corresponds to a wider interval tends to be less sensitive to local alternatives. Also note that the intervals delivered by the moving block bootstrap with inefficient Studentizing are wider than those of the self-normalized method uniformly in the block sizes.

Please insert Figures 1 and 2 here!

From Figures 3 and 4, we see that the empirical coverages for $\rho(1)$ and $F(\pi/2)/F(\pi)$ delivered by the self-normalized method are fairly close to the nominal level. The bootstrap with inefficient Studentizing produces apparent overcoverage and very wide intervals with small block sizes. Even so, with suitably chosen block sizes it can still achieve better coverage than the self-normalized method. The normal approximation is again seen to be superior to the moving block bootstrap without Studentizing uniformly in the block sizes, although its coverages deviate from the nominal level as the block size increases. By contrast, the moving block bootstrap with inefficient Studentizing appears to be less sensitive to the choice of block size.

Please insert Figures 3 and 4 here!

For inference on autocorrelations (e.g., $\rho(1)$), Romano and Thombs (1996) advocated the moving block bootstrap method without Studentizing, along with other nonparametric resampling methods. Although this method has been justified theoretically, the simulation results here suggest that it is not a good choice owing to its poor coverage. The normal approximation, which involves the choice of block size in its variance estimation, is also not recommended because of its sensitivity to the block size and its unsatisfactory coverage. In comparison, the self-normalized method has reasonably good coverage and is free of any tuning parameters. Consistent with Gonçalves and Vogelsang (2008), the block bootstrap with inefficient Studentizing can further improve the coverage of the self-normalized approach with a suitable choice of block length. However, this slight improvement in coverage comes at a computational cost, and the resulting interval tends to be wider.

5 Conclusions 

In this article, we propose a new approach to constructing confidence intervals (regions) for quantities in time series. The appealing features of the proposed self-normalized approach can be summarized as follows. (a) It is based on an asymptotically pivotal statistic and does not involve any user-chosen numbers. (b) It is easy to implement, since in general the calculation of recursive estimates requires only additional computation, without the need to design any new algorithms. (c) It is broadly applicable to approximately linear statistics that are functionals of empirical distributions of fixed dimension, and to their asymptotically equivalent variants. Additionally, the theory can be extended to cover the spectral mean and its normalized version, which are important quantities in time series. Given the encouraging finite sample performance presented in Section 4 and the attractive features above, we recommend this procedure to practitioners as a useful inference tool for routine use, such as obtaining the confidence interval of the lag 1 autocorrelation.

Simulation results suggest that our tuning-parameter-free approach may be further 
improved by applying the moving block bootstrap to approximate the sampling distribution 
of the self-normalized statistic. However, the improved coverage over the self-normalized 
approach is not guaranteed and it critically depends on the choice of block size. It would 
be interesting to come up with a sound data-dependent block size selection rule. The early 
proposals by Hall et al. (1995) and Politis et al. (1999) on block length selection may still 
work for the current problem but need more investigation. In practice, suppose the user knows how to choose the data-dependent block size properly and can afford the computational cost of the block bootstrap and of the block size selection. Then he or she is certainly encouraged to use the slightly wider bootstrap-based interval. In general, however, there are still grounds for recommending the self-normalized approach for its simplicity, convenience and reasonably good coverage. An interesting theoretical topic is to show that the block bootstrap can improve the self-normalization-based approximation in terms of the ERP, which remains an open problem. In addition, the extension of this method to spatial settings is worthwhile but seems not straightforward for irregular spatial data; this is currently under investigation.

6 Appendix 

Throughout the appendix, the positive constant $C$ is generic and may vary from line to line.

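Proof of Proposition 3.1. Note that the quantity to be bounded, scaled by $r_n^2$, can be written as $I + II$. Here $I$ is the stochastic part over the lags $k = B_n, \ldots, r_n - 1$, and the latter term $II$ satisfies

$$II \le r_n^2\Big(\sup_{k}|g_k|\sum_{k=B_n}^{r_n-1}|\gamma(k)|\Big)^2 \le Cn^2\Big(\sum_{k=B_n}^{\infty}|\gamma(k)|\Big)^2 = o(n).$$

Let $W_{tk} = X_tX_{t-k}$. Regarding term $I$, we have

$$I \le 2\,\mathrm{var}\Big(\sum_{k=B_n}^{r_n-1}\sum_{t=1}^{r_n}W_{tk}g_k\Big) + 2\,\mathrm{var}\Big(\sum_{k=B_n}^{r_n-1}\sum_{t=1}^{k}W_{tk}g_k\Big) = I_1 + I_2.$$

By the argument used in the proof of Theorem 1 in Shao (2009), term $I_1$ is less than or equal to $Cn\sum_{k=B_n}^{\infty}g_k^2 = o(n)$. Here we have used the fact that $f(\cdot)$ and $f_4(\cdot,\cdot,\cdot)$ are both bounded under condition (9). Next, we write

$$\frac{I_2}{2} = \sum_{k,k'=B_n}^{r_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}\mathrm{cov}(W_{tk},W_{t'k'})\,g_kg_{k'}$$
$$= \sum_{k,k'=B_n}^{r_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}g_kg_{k'}\{\gamma(t-t')\gamma(t'-k'-t+k) + \gamma(t'-k'-t)\gamma(t'-t+k) + \mathrm{cum}(X_0,X_{-k},X_{t'-t},X_{t'-k'-t})\} = I_{21} + I_{22} + I_{23}.$$

Let $\Pi^2 = [-\pi,\pi]^2$ and $H_k(\lambda) = \sum_{t=1}^{k}e^{it\lambda}$. For term $I_{21}$, we have

$$I_{21} = \int_{\Pi^2}\sum_{k,k'=B_n}^{r_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}g_kg_{k'}e^{i(t-t')\lambda_1}f(\lambda_1)e^{i(t'-k'-t+k)\lambda_2}f(\lambda_2)\,d\lambda_1\,d\lambda_2$$
$$= \int_{\Pi^2}\sum_{k,k'=B_n}^{r_n-1}g_ke^{ik\lambda_2}H_k(\lambda_1-\lambda_2)\,g_{k'}e^{-ik'\lambda_2}H_{k'}(\lambda_2-\lambda_1)f(\lambda_1)f(\lambda_2)\,d\lambda_1\,d\lambda_2.$$

By the Cauchy-Schwarz inequality, we have that

$$|I_{21}| \le C\int_{\Pi^2}\Big|\sum_{k=B_n}^{r_n-1}g_ke^{ik\lambda_2}H_k(w)\Big|^2\,dw\,d\lambda_2.$$

Summation by parts yields

$$\sum_{k=B_n}^{r_n-1}g_ke^{ik\lambda_2}H_k(w) = H_{r_n}(w)\sum_{k=B_n}^{r_n-1}g_ke^{ik\lambda_2} - \sum_{k=B_n}^{r_n-1}e^{i(k+1)w}\sum_{h=B_n}^{k}g_he^{ih\lambda_2},$$

and consequently,

$$|I_{21}| \le C\int_{\Pi^2}|H_{r_n}(w)|^2\Big|\sum_{k=B_n}^{r_n-1}g_ke^{ik\lambda_2}\Big|^2d\lambda_2\,dw + C\int_{\Pi^2}\Big|\sum_{k=B_n}^{r_n-1}e^{i(k+1)w}\sum_{h=B_n}^{k}g_he^{ih\lambda_2}\Big|^2d\lambda_2\,dw$$
$$\le Cn\sum_{k=B_n}^{r_n-1}g_k^2 + C\sum_{k=B_n}^{r_n-1}\int_{-\pi}^{\pi}\Big|\sum_{h=B_n}^{k}g_he^{ih\lambda_2}\Big|^2d\lambda_2 \le Cn\sum_{k=B_n}^{\infty}g_k^2 = o(n).$$

By the same argument, we can show that $|I_{22}| = o(n)$. As for term $I_{23}$, provided the fourth-order cumulants are absolutely summable, we have

$$|I_{23}| \le C\sup_{k\ge B_n}g_k^2\sum_{k,k'=B_n}^{r_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}|\mathrm{cum}(X_0,X_{-k},X_{t'-t},X_{t'-k'-t})|$$
$$\le Cn\sup_{k\ge B_n}g_k^2\sum_{k,k',h\in\mathbb{Z}}|\mathrm{cum}(X_0,X_{-k},X_h,X_{h-k'})| = o(n).$$

Therefore, $|I| = o(n)$, and hence $I + II = o(n)$. Finally, we note that

$$\mathrm{var}\Big(\sum_{k=0}^{B_n-1}\sum_{t=1}^{k}W_{tk}g_k\Big) = \sum_{k,k'=0}^{B_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}g_{k'}g_k\,\mathrm{cov}(W_{tk},W_{t'k'})$$
$$= \sum_{k,k'=0}^{B_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}g_{k'}g_k\{\gamma(t-t')\gamma(t'-k'-t+k) + \gamma(t'-k'-t)\gamma(t'-t+k) + \mathrm{cum}(X_0,X_{-k},X_{t'-t},X_{t'-k'-t})\}.$$

Applying the same argument as used for term $I_{21}$, we can derive that the first two terms in the preceding display are $O(B_n) = o(n)$, and the last term is bounded by

$$C\sum_{k,k'=0}^{B_n-1}\sum_{t=1}^{k}\sum_{t'=1}^{k'}|\mathrm{cum}(X_0,X_{-k},X_{t'-t},X_{t'-k'-t})| = O(B_n) = o(n).$$

It is easy to see that the above bounds hold uniformly in $r \in (0,1]$. This completes the proof.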
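The summation-by-parts step used to bound $I_{21}$ rests on two facts. The first is the identity $\sum_{k=B}^{r-1}g_ke^{ik\lambda}H_k(w) = H_r(w)\sum_{k=B}^{r-1}g_ke^{ik\lambda} - \sum_{k=B}^{r-1}e^{i(k+1)w}\sum_{h=B}^{k}g_he^{ih\lambda}$, with $H_k(w) = \sum_{t=1}^{k}e^{itw}$. The second is $\int_{-\pi}^{\pi}|H_r(w)|^2dw = 2\pi r$. The sketch below checks both facts numerically, using arbitrary real coefficients $g_k$ and hypothetical function names. The second fact is checked through an equispaced average, which is exact when the number of grid points is at least $r$.

```python
import numpy as np

def H(k, w):
    """H_k(w) = sum_{t=1}^{k} exp(i t w); the empty sum (k = 0) is 0."""
    return np.sum(np.exp(1j * np.arange(1, k + 1) * w))

def summation_by_parts_gap(g, B, r, w, lam):
    """Absolute difference between the two sides of the identity for lags k = B, ..., r - 1."""
    def a(h):                                  # g_h e^{i h lam}
        return g[h] * np.exp(1j * h * lam)
    def A(k):                                  # partial sum of a(h) for h = B, ..., k
        return sum(a(h) for h in range(B, k + 1))
    lhs = sum(a(k) * H(k, w) for k in range(B, r))
    rhs = H(r, w) * A(r - 1) - sum(np.exp(1j * (k + 1) * w) * A(k) for k in range(B, r))
    return abs(lhs - rhs)

def mean_abs_H_squared(r, M):
    """(1/M) sum_j |H_r(w_j)|^2 over M equispaced points of [-pi, pi); equals r whenever M >= r."""
    w = -np.pi + 2.0 * np.pi * np.arange(M) / M
    return float(np.mean([abs(H(r, wj)) ** 2 for wj in w]))
```

The second function is the discrete counterpart of $(2\pi)^{-1}\int|H_r|^2 = r$. This is what turns the first integral in the bound for $|I_{21}|$ into $Cn\sum_{k\ge B_n}g_k^2$.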
Table 1: Empirical rejection percentages for the eight models at the 5% and 10% levels when (a) $n = 100$ and (b) $n = 500$. The rows (i), (ii) and (iii) correspond to the results for $T_K$, $T_K$ and $Q_K$, respectively. The number of replications is 5000. The largest standard error is 0.53%.
Table 2: Size-adjusted power (in percent) under the alternative AR(1)-GARCH(1,1) (shown in (a)) and the AR(1) with innovations following the bilinear model (8) (shown in (b)) at the 5% and 10% levels. The rows (i), (ii) and (iii) correspond to the results for $T_K$, $T_K$ and $Q_K$, respectively. The number of replications is 5000. The largest standard error is 0.71%.
12.0   20.9   34.2   47.5   58.4
(iii)  15.0   30.0   48.2   59.0   62.2
12.0
50.0
25.0   40.2   51.8
11.9   19.9   31.7   42.6   50.9
Table 3: Coverage percentages for the confidence interval for $\gamma(1)$ (shown in (a)) and $F(\pi/2)$ (shown in (b)) under models $M_1$-$M_6$. The number on the left-hand side of each column is the coverage percentage obtained from the method in Shao (2009). The number in square brackets is the coverage percentage delivered by the self-normalized method proposed in this paper, and the number in curly brackets is the coverage percentage for the efficiently Studentized method. The number of replications is 1000. The largest standard error is 1.42%.
93.8 [91.3] {90.0}
M6
95%
M1
84.5 [86.0]
M3
92.1 [93.0]
M5
88.7 [88.9]
92.0 [91.8]
M6
84.2 [84.8]
83.4 [84.2]
89.4 [89.9]
Table 4: Coverages for the confidence interval of $\rho(1)$ (shown in (a)) and $F(\pi/2)/F(\pi)$ (shown in (b)) under models $M_1$-$M_6$. The entries in each column are as follows:
- the number on the left-hand side is the coverage percentage obtained from the method in Shao (2009);
- the number in parentheses is the percentage of empty intervals produced by Shao's (2009) method;
- the number in square brackets is the coverage percentage delivered by the self-normalized method proposed in this paper;
- the number in braces is the coverage percentage for the efficiently Studentized method.

The number of replications is 1000. The largest standard error is 1.13%.
               n = 150                                                   n = 600
100(1 - α)%    90%                        95%                            90%                        95%
M1             85.1 (0.0) [90.4] {89.5}   92.2 (0.0) [95.6] {93.8}       88.8 (0.0) [89.7] {93.6}   94.7 (0.0) [94.9] {96.0}
M2             85.7 (0.6) [91.9] {93.6}   90.6 (1.1) [95.4] {95.8}       91.4 (0.0) [91.5] {96.8}   95.3 (0.0) [96.4] {98.5}
M3             86.6 (0.5) [87.5] {88.3}   89.8 (2.6) [93.2] {92.2}       89.1 (0.0) [88.1] {93.7}   93.6 (0.0) [93.2] {96.2}
M4             89.0 (0.0) [87.9] {90.8}   94.1 (0.0) [95.1] {94.3}       89.1 (0.0) [89.3] {94.5}   94.0 (0.0) [94.4] {96.7}
M5             90.6 (0.3) [88.7] {96.2}   94.1 (1.5) [93.7] {97.8}       88.4 (0.0) [89.8] {99.1}   93.6 (0.1) [94.7] {99.7}
M6             89.8 (0.6) [86.1] {88.1}   93.0 (2.0) [90.2] {92.1}       89.8 (0.0) [87.6] {91.4}   95.3 (0.1) [93.3] {94.9}
89.5 (0.0) [89.3]
M5
M6             91.1 (0.6) [88.7]   93.0 (2.3) [93.2]   90.8 (0.0) [88.4]   95.7 (0.1) [93.3]






Table 5: (a) Coverages for the confidence interval for the median of $X_1$ out of 10000 replications under models $M_1$-$M_6$. The largest standard error is 0.34%. (b) Coverages for the confidence interval of the autoregressive coefficients based on LAD regression for models $M_1$-$M_3$ and $M_7$-$M_9$. The number of replications is 1000. The largest standard error is 1.00%.
Figure 1: Empirical coverage probabilities (left panel) and ratios of the interval widths to that delivered by the self-normalized method (right panel) for $E(X_1)$. The sample size is $n = 50$ and the number of replications is 2000.
Figure 2: Empirical coverage probabilities (left panel) and ratios of the interval widths to that delivered by the self-normalized method (right panel) for $\mathrm{med}(X_1)$. The sample size is $n = 50$ and the number of replications is 2000.
Figure 3: Empirical coverage probabilities (left panel) and ratios of the interval widths to that delivered by the self-normalized method (right panel) for $\rho(1)$. The sample size is $n = 50$ and the number of replications is 2000.
Figure 4: Empirical coverage probabilities (left panel) and ratios of the interval widths to that delivered by the self-normalized method (right panel) for $F(\pi/2)/F(\pi)$. The sample size is $n = 50$ and the number of replications is 500.